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Abstract 

Background: The inference of gene regulatory networks (GRNs) from large-scale expression profiles is one of the 
most challenging problems of Systems Biology nowadays. Many techniques and models have been proposed for 
this task. However, it is not generally possible to recover the original topology with great accuracy, mainly due to 
the short time series data in face of the high complexity of the networks and the intrinsic noise of the expression 
measurements. In order to improve the accuracy of GRNs inference methods based on entropy (mutual 
information), a new criterion function is here proposed. 

Results: In this paper we introduce the use of generalized entropy proposed by Tsallis, for the inference of GRNs 
from time series expression profiles. The inference process is based on a feature selection approach and the 
conditional entropy is applied as criterion function. In order to assess the proposed methodology, the algorithm is 
applied to recover the network topology from temporal expressions generated by an artificial gene network (AGN) 
model as well as from the DREAM challenge. The adopted AGN is based on theoretical models of complex 
networks and its gene transference function is obtained from random drawing on the set of possible Boolean 
functions, thus creating its dynamics. On the other hand, DREAM time series data presents variation of network size 
and its topologies are based on real networks. The dynamics are generated by continuous differential equations 
with noise and perturbation. By adopting both data sources, it is possible to estimate the average quality of the 
inference with respect to different network topologies, transfer functions and network sizes. 

Conclusions: A remarkable improvement of accuracy was observed in the experimental results by reducing the 
number of false connections in the inferred topology by the non-Shannon entropy. The obtained best free 
parameter of the Tsallis entropy was on average in the range 2.5 < q < 3.5 (hence, subextensive entropy), which 
opens new perspectives for GRNs inference methods based on information theory and for investigation of the 
nonextensivity of such networks. The inference algorithm and criterion function proposed here were implemented 
and included in the DimReduction software, which is freely available at http://sourceforge.net/projects/ 
dimreduction and http://code.google.eom/p/dimreduction/. 



1 Background 

In general, living organisms can be viewed as net-works 
of molecules connected by chemical reactions. More 
specifically, the cell control involves the activity of sev- 
eral related genes through gene networks, with the rela- 
tionship among them being generally broadly unknown. 
The inference or reverse-engineering of such gene 
networks is very important to uncover the functional 
relationship among genes and can be defined as the 
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identification of gene interactions from experimental 
data through computational analysis [1]. 

Gene regulatory networks (GRNs) are used to indicate 
the interrelation among genes in the genomic level [2]. 
Such information is very important for disease treatment 
design, drugs creation purposes and to understand the 
activity of living organisms in the molecular level. In 
fact, there is a strong motivation for the inference of 
GRNs from gene expression patterns, e.g., motivating 
the DREAM project [3]. 

The development of techniques for sampling expres- 
sion levels of genes along time has increased the possibi- 
lity of important advances in the understanding of 
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regulatory mechanisms of gene transcription and protein 
synthesis. In this context, an important task is the study 
and identification of high-level properties of gene net- 
works and their interactions, without the necessity of 
low-level biochemical descriptions. It is not the objective 
of this work to analyze a detailed biochemical model. 
The objective is to recover the gene connections in a 
global and simple way, by identifying the most signifi- 
cant connections (relationships). 

While it is not possible to infer the network topology 
with great accuracy using only gene expression measure- 
ments mainly due to the short sample sets and the high 
system dimension, i.e., the number of genes, as well as 
its complexity [4], the use of such inferences can be 
very important for planning experiments and/or to 
focus in some small meaningful subgroups of genes, 
thus reducing the complexity of the problem. 

We are interested in the inference of network topol- 
ogy from temporal expression profiles by minimizing 
the conditional entropy between the genes, i.e., the gene 
entropy conditioned to the state of others genes. Given 
a gene, the idea is to set as predictors the genes that 
minimize its entropy. Therefore, the conditional entropy 
works as a criterion function which has to be mini- 
mized. As in a typical machine learning problem, the 
quality of the inference depends on the data and the cri- 
terion function. If the data is not representative, the 
obtained solution will probably not be a global mini- 
mum but a local one. Similarly, if the criterion function 
is not suitable, the solution can only partially satisfy the 
constraint imposed by the data or even represent a 
wrong solution. Of course, since the criterion function 
follows the properties of the entropy concept, a comple- 
tely wrong solution is not expected. In other words, if 
the observation of some genes reduces the uncertainty 
on the target gene, the prediction accuracy is improved. 
But it may not be the best or optimal one, which brings 
the question: what is the best entropy function for the 
inference of GRNs? 

In this paper we address this question by presenting a 
new criterion function for the inference of GRNs in 
order to introduce the sensibility of the minimum con- 
ditional entropy regarding its functional form. The gen- 
eralized entropy functional form proposed by Tsallis [5] 
is adopted, which not only recovers the Shannon form 
but also presents properties required by the Statistical 
Physics Theory. These properties are related to Thermo- 
dynamics principles, to the concept of stability and its 
axiomatic foundations [6]. 

A variety of statistical methods to infer network topol- 
ogy has been applied to gene expression data [1,7-20]. 
The results are often evaluated by comparing predicted 
couplings with those known from biological databases. 
While this procedure can elucidate or corroborate 



inferred interactions between some couples of genes, it 
has the drawback of the difficulty in estimating the false 
detection rate [4] and thus making the validation pro- 
cess very difficult. As it is not always possible to assure 
the quality of inference methods by analytical calculus, 
mainly in high complex systems, it is very important to 
use computational experiments to do it. Besides, in such 
experiments (simulations) it is possible to investigate 
prior information, as topology classes {e.g., random or 
scale-free networks), or the system dynamics. Therefore, 
an Artificial Gene Network (AGN) platform [21,22] and 
the DREAM4 in silico network challenge [3] are 
explored in the present paper in order to assess the 
GRNs inference process by generalized entropy intro- 
duced in the present paper. 

2 Results and Discussion 

2.1 Experiments 

In order to verify the effect of the entropic parameter q, 
we carried out inference experiments considering two 
types of network topologies: the uniformly-random 
Erdos-Renyi (ER) and the scale-free Barabasi- Albert (BA) 
models [23-25]. In the ER model each connection (edge) 
is present with equal probability, in such way that the 
probability distribution of the connectivity of the genes 
follows a Binomial or Poisson distribution, with mean = 
(/<). On the other hand, in the BA model the probability 
of a new node v ; be connected to the node v t is propor- 
tional to the connectivity of v b which produces a power- 
law in the probability distribution of the connectivity. 

The data set D T was generated according to Sec. 4.3.2 
with N = 100 (the number of genes). For each type of 
network model 10 sequences of 30 transitions starting 
from random initial states were generated, which are 
obtained by applying Boolean transition functions. Then, 
the 10 segments were concatenated into a single expres- 
sion profile, which was submitted to the network infer- 
ence method. The inference was made by means of 
Equation 6 with q varying from 0.1 to 3.1 in steps of 0.1 
and from 3.1 to 10.1 in steps of 0.5, i.e., the similarity 
between the source and the inferred AGN was calcu- 
lated to each q in this range. 

The similarity curves shown in Figure 1 were obtained 
by averaging 50 runs (different source networks) for 
each network model. In both network models improve- 
ments were observed in the similarity by ranging q, with 
the maximum {Similarity (A, B)) being reached by q * 1 
for all tried (/<). Besides, it can also be noted that the q* 
that maximizes the similarity seems to be almost inde- 
pendent of the network model and the average connec- 
tivity. Figures 2(a) and 2(b) show the boxplots of the 
similarity values for each q and k values. It is possible to 
notice a very small variation in the boxplots, indicating 
stable results for all q values. 
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(a) ER network model 
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(b) BA network model 

Figure 1 Similarity between AGNs and inferred networks as a 
function of the entropic parameter q. Similarity between source 
and inferred networks as a function of the entropic parameter q for 
each average connectivity (1 < (k) < 5): (a) average values for the 
uniformly-random Erdos-Renyi (ER) and (b) average values for the 
scale-free Barabasi-Albert (BA). The simulations were performed for 
100 genes (N = 100) and represent the average over 50 runs. 



In order to better investigate this behavior, Figure 3 
shows the normalized frequency curves of the best q for 
each gene in the sense of higher similarity. It is clearly 
observed that higher frequencies are concentrated in the 
range 2 < q < 3 for both network models and varied 
connectivity. This indicates and reinforces (Figure 1) a 
non-dependence on the topology network in the 
improvement of the inference by taking non-Shannon 
entropy (q * 1). 

In particular, considering the frequency curves in Fig- 
ure 3, the average q* was calculated for each network 
model given the average connectivity. These averages seem 
to be almost constant (around 3.20 for the ER model and 
3.23 for the BA model) as well as the qs with higher fre- 
quencies, i.e., maximum amplitude in the frequency curves. 



In order to confirm our findings, we also evaluate the 
behavior of the proposed methodology by using the 
DREAM4 in silico network challenge [3]. In this chal- 
lenge the time series data was considered, which pro- 
vides five different networks of size 10 and other five of 
size 100. The networks of size 10 have 5 different time 
series, while the networks of size 100 have 10 time ser- 
ies. Each time series has 21 time points generated from 
a differential equations model with noise. The DREAM4 
in silico network challenge has 5 and 10 time series with 
21 time points each, which were also concatenated to 
form a single expression profile, similarly to the previous 
case (AGNs). 

The same methodology was applied with the similar 
used parameters. Only one additional step was included 
for the quantization of the DREAM data. The proposed 
criterion function and the adopted methodology are 
based on entropy calculations, in which a step of data 
quantization may be required if the original input data 
is not discrete, is the case of DREAM data. The applied 
method for the quantization process is described in [26]. 
It was applied by considering 2 levels for networks of 
size 10 and 3 levels for networks of size 100. In this 
context, an integer value represents each quantization 
level used by the quantization process. For example, 2 
levels means that the quantized signal has only 0's and 
l's. Then, each quantized network signal was submitted 
to the same methodology adopted in the present pa-per. 
Figure 4 presents the average results obtained for each 
DREAM network size: 10 and 100. It is possible to 
notice an improvement on the similarities by varying 
the parameter q, in which the best results were obtained 
by q * 1 for the two network sizes. 

Figure 5 presents the normalized frequency, in which 
the q value was able to infer the best set of predictors 
(higher similarity) for each gene. The higher frequencies 
are concentrated in the range 2.2 < q < 4.1 for the 
DREAM network of size 10 and 3.2 < q < 5.5 for the 
DREAM network of size 100. Regarding the frequency 
curve in Figure 5, the average <f was calculated for each 
network size, being around 3.30 for the DREAM 10 and 
3.92 for the DREAM 100, which are similar to those 
presented for ER and BA networks, but with slightly 
higher value for DREAM 100 network. It is important 
to highlight the existence of a range of q values that 
produce better results, on average 2.5 < q < 3.5 (subex- 
tensive entropy). 

All experimental results confirm that the proposed 
criterion function can improve the accuracy of the infer- 
ence process, thus indicating that the network nonex- 
tensivity is an important matter of investigation for 
inference methods based on information theory. As a 
result, it achieved a better accuracy on the inference of 
GRNs from gene expression patterns. 
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Figure 2 Distribution of the similarity values between AGNs and inferred networks as a function of the entropic parameter q 

Distribution of the similarity values between source and inferred networks as a function of the entropic parameter q for each average 
connectivity (1 < (k) < 5): (a) boxplot for the uniformly-random Erdos-Renyi (ER) and (b) boxplot for the scale-free Barabasi-Albert (BA). The 
simulations were performed for 100 genes (N = 100) and represent the average over 50 runs. 
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(b) BA network model 

Figure 3 Frequencies where the q value appears in the better 
answer for AGNs. Frequencies where the q value appears in the 
better answer for each target gene and for each average 
connectivity (1 < (k) < 5): (a) uniformly-random Erdos-Renyi (ER) and 
(b) scale-free Barabasi-Albert (BA). The simulations were performed 
for 100 genes (N = 100) and represent the average over 50 runs. 



2.2 Discussion 

The use of the entropy or mutual information as a cri- 
terion function on the problem of network inference is 
not new and has been largely applied for the inference 
of GRNs in recent years [1,10,11,13,14,16,17,19,20]. 
This is explained by the possibility that some genes 
may be well predicted by observing states of other 
genes in a regulatory network, which makes the use 
of conditional entropies suitable. If the relationship 
between these genes are linear, a simple Pearson corre- 
lation analysis would be enough to get a good des- 
cription of the gene network. However, when the 
relationship between genes is not linear but it is 
described by functions of more than one predictor 
gene, it is expected that the inference by methods based 
on the entropy concept produces better results than 



1 




0.9 
0.8 
0.7 






0.6 






milari 






55 0.4 

0.3 






0.2 
0.1 






0 


).l 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 

Tsallis Entropies (q) 




(a) DREAM 10 


0.9 
0.8 
0.7 






0.6 






Similar 






0.3 
0.2 






0.1 






0.1 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5 10 

Tsallis Entropies (q) 




(b) DREAM 100 


Figure 4 Similarity between DREAM and inferred networks as a 


function of the entropic parameter q. Similarity between gold 


standard and inferred networks as a function of the entropic 


parameter q, by considering the average results over the five 


DREAM networks available of each size: 10 and 100 genes. 



those based on Pearson correlation. Naturally, this leads 
to the necessity of investigating the sensibility or 
robustness of these methods with respect to the extern 
sivity of the applied entropy. In this context, it was veri- 
fied in a previous work [27] that the entropic parameter 
q was very important to achieve better results in the 
GRNs inference process. In the present work, we intro- 
duce a criterion function by adopting the generalized 
Tsallis entropy in order to verify the dependence of the 
inference on the entropy functional form and character- 
ize how this dependence occurs. 

The experimental results provide more evidence about 
the sensibility of the inference process to the extensive/ 
nonextensive entropies. In addition, the experimental 
results indicate that the nonextensivity property of the 
entropy is an important factor to be investigated and 
explored in the GRNs inference process in order to 
improve its accuracy, thus opening new perspectives for 
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Figure 5 Frequencies where the q value appears in the better 
answer for DREAM networks. Frequencies where the q value 
appears in the better answer for each target gene, by considering 
the average results over the five DREAM networks available of each 
size: 10 and 100 genes. 



inference methods based on the entropy minimization 
principle. 

As expected, we observed different similarity scores 
for different entropic parameters q. The maximum simi- 
larity score for all tried network models was reached by 
q * 1, with an improvement of 20% compared to the 
similarity score for q = 1 (see Figure 1 and 4). In order 
to better visualize the relevance of this improvement, it 
is important to take a look closer on the correctly and 
incorrectly inferred edges. For a network with N genes, 
N 2 directed edges are possible when every node is con- 
nected to itself and to each other, (Qy = 1 for all 1 < i, j 
< N). As the simulations were made with 1 < (k) < 5, C 
was always a sparse matrix with the number of connec- 
tions between the genes given by T P + F N . 



Table 1 presents the best number of correctly and 
incorrectly inferred edges by considering each gene indi- 
vidually. It is possible to observe a very good accuracy 
of recovering correct edges (T P and F P ) in the ER 
and BA model by adopting q = 2.5 (subextensive 
entropy). In this context, the recovery of false connec- 
tions (FP) seems to be dependent of the best entropy 
functional form. On the other hand, the network model 
does not seem to be dependent. Therefore, in order to 
improve the inference it is necessary to introduce infor- 
mation about the class model in the method. Further- 
more, another observed property that does not depend 
on the network class model is the reduction on the 
number of inferred false connections (FP), i.e., when the 
algorithm infers a connection that does not exist 
between a pair of genes. This indicates a more conserva- 
tive inference when an adjusted q is used, even for net- 
works with high connectivity - the number of FP 
connections for (k) = 5 obtained by the Shannon 
entropy was more than six times greater than that 
obtained by the generalized entropy with the adjusted 



Table 1 The best results found for q = 2.5 compared with 
q = 1.0. 
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(b) BA network model 
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The best results found for q = 2.5 compared with q = 1.0 by considering each 
gene individually in the same network: (a) uniformly-random Erdos-Renyi 
model (ER) and (b) scale-free Barabasi-Albert network topology (BA). 
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q - 2.5 for the BA networks and more than eight times 
greater for the ER networks. 

It was also observed that distributions with mass con- 
centrated in one of the classes are less penalized by 
applying q values near to 2.5. By considering that the 
system (organism) has a stochastic behavior and can 
receive external perturbations, it is expected that the 
class distributions are not deterministic among the pos- 
sible classes, i.e, in binary case 0 or 1. In other words, 
given the nature of the system it is desirable from 
method to infer connections from classes with concen- 
trated distributions and few errors among its classes 
(Table 2(b)) compared to more uniform distributions in 
one of the classes and no errors in the other (Table 2 
(a)). An important observed issue is that subextensive 
entropies, e.g., q values near to 2.5, promote this prefer- 
ence in the presented inference method. Table 2 shows 
an example of probability distribution that illustrates 
this situation. The predictor states are on the first col- 
umn and the number of observed states for the target 
states on columns two and three, thus generating a 
mass probability distribution table for a target gene by 
observing its predictor states. In columns four, five and 
six we have the criterion function results (conditional 
entropy) for each distribution by using different q 
values. The mean conditional entropy results marked 
with * represent the minimal achieved by the method, 
and therefore selected as predictor for the target by the 
inference method. 

As we can see, the minimum criterion function score 
changes with q and so the gene will be selected as pre- 
dictor. For q = 0.5 and 1.0 the method selected gene 
A as best predictor, while gene B is selected for q = 2.5. 
For almost probable states, the derivative of the 



Table 2 Example of change on the inferred predictor by 
using different values for q entropic parameter. 



(a) 




Target 


Criterion Function Results 


Predictor A 


0 1 


q = 0.5 


q = 1 


q = 2.5 


0 


18 23 


0.108 


0.090 


0.056 


1 


278 0 


0 


0 


0 


mean conditional entropy 




0.108* 


0.090* 


0.056 


(b) 




Target 


Criterion Function Results 


Predictor B 


0 1 


q = 0.5 


q = 1 


q = 2.5 


0 


1 16 


0.024 


0.013 


0.005 


1 


295 7 


0.265 


0.104 


0.036 


mean conditional entropy 




0.289 


0.117 


0.041* 



Example of change on the inferred predictor by using different values for q 
entropic parameter: (a) distributions that lead to wrong predictor and (b) 
distributions that lead to correct predictor. 



generalized entropy increases as q decreases (see Fig- 
ure 6). This behavior allows S q (target\B = 1) to be sig- 
nificantly greater than S q {target\A = 1) depending on q. 
In this context, distributions concentrated in one of the 
classes (few errors) can produce higher conditional 
entropy values, which can be very amplified by the pre- 
dictor distribution mass. Therefore, when q = 0.5 or 1.0 
the method selects the predictor gene A since it induces 
a null entropy on the target (when A is active), besides 
the high entropy on the target induced when it (gene A) 
is inactive. However, when q is set to 2.5 (subextensive 
entropy) the balance between the conditional entropy 
and the predictor probability mass is adjusted in order 
to produce better accuracy on the inference process. 

In summary, this situation characterizes how the subex- 
tensive entropy (q = 2.5) produces better results. In this 
example, it is considered a single target gene with a fixed 
number of time points on its expression data. Hence, 
Table 2(a) and 2(b) characterize two conditions of fre- 
quencies distribution that produce different predictors for 
the same target gene by using different values of q, in 
which q = 2.5 (subextensive entropy) achieves the correct 
predictor for the target. This example illustrates the 
trade-off between the conditional entropy of the target 
and the probability distribution of the predictor. 

Tables 1(a) and 1(b) present the results obtained by a 
single value of the entropic parameter q = 2.5, in order 
to show how the improvements are achieved by fixing 
q value on the range 2.5 < q < 3.5 (subextensive 
entropy). However, the main point in the Tsallis Theory 
is that there is not an universal q that should be used 
on every data set. The optimal q should be set by the 
system (or kind of systems), e.g., we have observed that 




P(vi = 1) 

Figure 6 Generalized entropy S q as a function of the 
probability = 1). Generalized entropy S q as a function of the 
probability P{v-, = 1) for binary genes, v-, e {0, 1}. From up to bottom, 
the curves were obtained with q set to: 0.1, 0.25, 0.5, 1.0, 1.5, 2.0, 
3.0, 4.0, 5.0. 

I ) 
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for Boolean networks this value was found close to 2.5 
and 3.5 for the DREAM networks. If we pay attention to 
the Figures 2(a) and 2(b), it will be noted that not only 
the averaged similarity is improved by considering q = 
2.5 instead of q = 1, but also the best and worst infer- 
ences (the highest and lower line in the boxplot) 
obtained in the sample dataset. Besides, it can also be 
observed the variance in the similarity is almost con- 
stant with respect to q (q = 1 and q = 2.5) for low levels 
of connectivity (small k) and reduced for high levels of 
connectivity (large k) when q = 2.5. 

An important property of the GRNs inference method 
regards stability. The boxplots results shown in Figures 
2(a) and 2(b) present very small variations for all tested 
q values. These results are an important indicative of 
the stability of the proposed methodology. 

3 Conclusions 

In general, reverse-engineering algorithms using time 
series data need to be improved [1]. The present work 
opens new perspectives for methods based on informa- 
tion theory, in face of all results discussed which show a 
relevant improvement on the inference accuracy by 
adopting nonextensive entropies proposed by Tsallis. In 
particular, the subextensive entropies provide a remark- 
able improvement of accuracy by reducing the number 
of false connections detected by the method. The 
obtained experimental results showed the importance of 
the range of values 2.5 < q < 3.5 (subextensive entropy). 

An interesting point regards the logic circuits created 
by Boolean functions and its dynamics. The inference 
method finds some of them independent of the q value, 
while others are found by tuning this parameter, as pre- 
sented in the previous section. Future works should 
investigate the Boolean functions or logic circuits that 
are sensitive to entropic parameter q and the local struc- 
tures formed by them. 

The inference algorithm and criterion function 
described in this work were implemented and included 
in the DimReduction software [26], which is freely avail- 
able at http://sourceforge.net/projects/dimreduction and 
http://code.google.eom/p/dimreduction/. 

4 Methods 

4.1 Selecting predictors by conditional entropy 

The mutual information may be understood as a mea- 
sure of the dependence between variables, with this 
dependence being quantified by calculating the average 
amount in the uncertainty on some variable v t given the 
knowledge about other accessible variable v h and vice- 
versa. In this sense, the mutual information indicates 
how much the prediction error of the state of v t changes 
if we know the state of v k . 



Given two random variables v t and v k , their mutual 
information can be calculated by [28]: 



Vi v k 

S(vi) - S[vi\Vk), 



P{Vi, Vk) 

P(yi)P(yk) 



(1) 



where 



Sfa) = -£>(vO In P(i/<), 

Vi 

S{Vi\v k ) = -^PW^P^il^lnP^I^) 

Vk Vi 

are the Boltzmann-Gibbs entropy of the gene i and its 
conditional entropy on the gene v h also known as the 
Shannon entropy and its conditional entropy, respectively. 

If the states of the genes taken into account in Equa- 
tion 1 are collected in distinct times, i.e., V/(£+l) and v k (t), 
the mutual information can be used to select predictor 
genes (v k (t)) as those that minimize the uncertainty on 
the target gene (v/(£ + 1)). Thus, the method consists in 
finding the gene v k that maximizes Equation 1 for a given 
target gene v if which is equivalent to find the gene v k that 
minimizes the conditional entropy S(Vi(t + 1)| V A:W)« 
Despite the symmetry in I(Vi(t +1), v k (t)) with respect to 
the variables v t {t + 1) and v k (t), since the state variables 
computed in it belong to different time instants, t and 
t + 1, it is possible to infer a causality between v t (t + 1) 
and v k {t). As 7(v/(£ + 1), v k (t)) is not necessarily equal to 
I(v k (t + 1), Vi(t)) t this causality can be estimated by the 
difference between 7(v/(£+l), v k (t)) and I(v k (t+1), v^t)) or, 
in a simple way, by S(v/(£ + l)\v k {t)). 

Naturally, the mutual information is not restricted to 
pairs of genes and we can use it to infer the dependence 
of Vi on groups of genes: 7(v/(£ + 1); {vj ...v k }(£)) = S(vi(t 
+ 1)) - S(vi(t + l)|{v ; - ...v k }(£)). Therefore, given a set D 
of temporal gene expression profiles from a network, 
the method looks for the group of genes that maximizes 
Equation 1 for each gene. If 7(v/(£ + 1); {v ; ...v k }(t)) pre- 
sents the maximum score calculated from D, then each 
gene of {v ; ...v k } is directly connected to v t as predictor. 
In the same way, if there is not a group that causes sig- 
nificantly variations on the mutual information, then v t 
is selected as a source or an isolated gene (in the case 
that Vi is not selected as a predictor of any gene). Once 
the method is applied to each gene individually, the 
individual entropy of the target v t (S(v/(£ + 1))) is kept 
constant during the search for predictors, and as a result 
the method returns as predictors the genes that produce 
the lowest conditional entropy (S(Vi(t +l)|{v ; ...v k }(£))). 
In other words, the mutual information can be calcu- 
lated by the difference between the individual entropy S 
(Vi(t + 1)) and the mean conditional entropy S(Vi(t + 1)| 
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{Vj ...v k }(t)), by considering a group of genes g(t) = {v ; ... 
v k }(t). Therefore, the difference between I(v b v k ) and I(v b 
g) is due to the mean conditional entropy, once the indi- 
vidual entropy of v b S(Vi), is exactly the same in both 
I(v b v k ) and l{y b g). 

4.2 Beyond the Boltzmann entropy 

The concept of entropy was introduced by Clausius in 
the context of Thermodynamics considering only 
macroscopic statements [29]. Motivated by the idea of 
relating it to the Classical Mechanics some years later, 
Boltzmann showed that this entropy could be expressed 
in terms of the probabilities associated to the micro- 
scopic configuration of the system [30]. However, in his 
mathematical demonstration there were some considera- 
tions about the nature of the physical system to assure 
the recovery of the properties of Clausius macroscopic 
entropy by his microscopic approach - e.g., short-range 
interactions, a necessary condition to assure the exten- 
sivity of the Boltzmann entropy [6,31]. Thus, despite the 
great importance and success of the Boltzmann entropy, 
there are situations were such conditions are not pre- 
served [32] and Boltzmann entropy will hardly recover 
the properties of the Clausius entropy. 

Inspired by the probabilistic description of multifractal 
geometry, C. Tsallis proposed in 1988 a generalization 
of the Boltzmann entropy [5] which, along two decades, 
has been successful in presenting desired properties of 
Statistical Physics Theory [6,33] with great experimental 
accordance [31]. 

The proposed definition is [5] 



where k is a positive constant (which sets the dimen- 
sion and scale), w is the number of distinct configura- 
tions of the system, p t is the probability of such 
configuration and q e is the entropic parameter. 

The entropic parameter characterizes the degree of 
nonextensivity, which in the limit q — > 1 recovers 
S = —kj^i pilnpjk with k being set to the Boltzmann 
constant 

The Boltzmann-Gibbs entropy is said to be extensive 
in the sense that, for a system consisting of N indepen- 
dent but equivalent subsystems v = {v 1} v 2 > Vn}> the 
entropy of the system is given by the sum of the entropy 
of the subsystems: S(v) = NS(vi) [31]. In the Tsallis 
entropy, this extensivity is set by the parameter q, which 
can be clearly visualized by the compound rule [31]: 

S q (A,B) = S q (A) + S q {B) + (1 - q)S q {A)S q {B) f (3) 

with A and B being independent systems, i.e., P{A,B) = 
P(A)P(B). We can observe superextensivity for q <1, 



extensivity for q = 1 and subextensivity for q >1. More 
specifically, S q is always nonnegative for q >0. Although 
it is also possible to have S q >0 for some q <0, q >0 is 
generally used to avoid divergences or some inconsisten- 
cies [31]. 

Equation 2 has been largely applied to different physi- 
cal problems, e.g., http://www.cbpf.br/GrupPesq/Statisti- 
calPhys/biblio.htm for a large bibliography, leading to 
good agreements with experimental data. Naturally, 
despite these applications, it can be asked if the Tsallis 
entropy is also suitable to code information in a general 
way such as Shannon [34], Khinchin [35] and Kullback 
[36] showed to be the Boltzmann entropy. Some papers 
have been published verifying the mathematical founda- 
tion of the Tsallis entropy, similarly to the axiomatic 
approach used by Khinchin [37,38], as well as investigat- 
ing its nonaddictive features and their interpretations 
[6,39]. As in typical physical problems, there are some 
examples where the Boltzmann-Shannon entropy is not 
suitable [40]. Besides, it is also possible to define a 
divergence equivalent to the Kullback-Leibler [41]. 

By defining ln q (x) = (x 1 ~ q -1)/(1 - q), Equation 2 can be 
written in a similar form of the Boltzmann entropy 
Sq = — fej^r P^ n cjPi' I n this wa Y> a generalized mutual 
information between v t and v k can be defined as [41]: 

^;v fe ) = EE^*(^g^). (4) 

The generalized mutual information has the necessary 
properties to be used as a criterion measure for consis- 
tent testing [42] and, as Equation 1, it reaches its mini- 
mum value when P(vi\v k ) = P(vi) and the maximum 
when - J2vi P{ v i\ v k)^qP{yi\Vk) vanishes [41], which is 
equivalent to make - J2 Vi ,v k p { v k)P{vi\vk) q ^qP{vi\V}i) van- 
ish. It is hence possible to look for dependencies 
between v t and v k by minimizing S q (vi\v k ). 

For binary genes, v t e {0, 1}, we have S q {Vj) = [P{vi = 
l) q + (1 - P(Vi = l)) q -1]/(1 - q) and the influence of the 
entropic parameter q can be easily observed. In Figure 6 
the maximum entropy for the gene increases as q 
decreases, taking the limit S™ ax = 1 as q — > 0. Indeed, 
when q ~ 0, S q {v t ) will be significantly different of S™ ax 
for P(vi = 1) ~ 0 or P(vi = 1) ~ 1, which means a very 
rigid criterion in the sense that, either the predictor can- 
didates fulfill all the constraints imposed by the data or 
they can not be selected as predictors. On the other 
hand, S™ ax -> 0 for q » 1 which can be interpreted as a 
very flexible criterion function in the sense that any 
gene or group of genes can be selected as good 
predictors. 

Another interesting point is the ordering of the 
entropy with respect to P(v t = 1). If the entropy of P(v t 
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= 1) = a is larger than the entropy of P(v t = 1) = b for a 
given q*, then it will always be large for any q - see Fig- 
ure 6. But this ordering is not preserved on the mean 
conditional entropy. For S q (Vi\v k ) the entropy of v t given 
v k is weighted by the probability of v k , 



(5) 



in such way that it is possible to have 
StfiVilVk) > S*,(vi\Vk) and S^(v,-|vfe) < S*,,(i/j|vfe) for some 
# * and where the index a represents the constraint 
{P{Vi = l\v k = 0) = a 0 , P{Vi = l\v k = 1) = a ± } and 6 the 
{P(v, = l\v k = 0) = b 0 , P(Vi = l\v k = 1) = This results 
in a trade-off between the relevance of the conditional 
entropy and the probability distribution of the predictor 
genes. 

In the context of feature selection or dependence vari- 
ables test, in which the entropy is used as a criterion 
function, this non-preservation of the ordering means 
the existence of an optimal q* by which a system can be 
best reproduced. As in physical problems, cf should be 
related to the system properties [31] and discovering the 
laws or principles which relate q* to these properties 
becomes fundamental to improve recovering methods. 

4.3 Proposed Method 

The algorithm is based on previous works [8,11], which 
consists in looking for the group of genes that mini- 
mizes the criterion function (i.e., conditional entropy) of 
the target gene. Therefore, for each given target v t we 
have to calculate the conditional probabilities P(v;(£+l)| 
Vj(t), v k {t)) based on the data set D T = {s(l), 5(2), s 
(T )}, where s{t) = [vi(t)> v 2 (i), v^t)] is the expression 
vector at time t, i.e., the state of the network at time t. 

For a network with N genes we have 
n p = J2x=i N\/x\(N — x) \ conditional probabilities to be 
calculated for each gene, i.e., n p possible groups of pre- 
dictors. Fortunately, it is not expected that the genes are 
regulated for many predictors [43,44] and an upper 
bound for n p can be defined. Kauffman observed that 
chaotic dynamics are more probable for gene networks 
with n p > 3 [43,44] and by stability principles he con- 
cluded that the average connectivity should be upper 
bounded by three, once the gene network could be in 
the frontier of chaos but not chaotic. Herein, we relax a 
little the Kauffman statement and set this upper bound 
on the average connectivity (k) < 5. 

Another important point is the possibility of gene net- 
works with different topology classes. In order to verify 
the sensibility of the method on the topology class, the 
topology of gene networks were generated with the uni- 
formly-random Erdos-Renyi (ER) [45] and with scale- 
free Barabasi-Albert (BA) [46] models. The BA complex 



network model is one of the most similar to known real 
regulatory networks [47,48]. Biological network topolo- 
gies based on Escherichia coli and Saccharomyces cerevi- 
siae [49] were also considered. 

We describe below how the artificial gene networks 
were generated, the algorithm of inference, evaluation 
metrics and the experimental results. 
4.3. 1 The inference algorithm and criterion function 
Given the temporal data D T the algorithm fixes a gene 
target v t and looks for the group of genes g that mini- 
mizes the conditional entropy S q (Vi(t + l)\g(t)) for a 
fixed q. As the network size is generally high, the search 
space becomes very high such that an exhaustive search 
is not appropriate. Then, we apply the Sequential For- 
ward Floating Search (SFFS) [50] to circumvent this 
combinatorial explosion. 

For the calculation of the conditional entropy (Equa- 
tion 5) it is necessary to estimate the conditional prob- 
abilities of the target given its predictor candidates as 
well as the probabilities of these candidates. In the 
absence of prior information, these probabilities are esti- 
mated by the relative frequencies on D T , which means 
an accuracy dependence on the representativity of D T . 
Once we are searching for the lower entropy, it is not 
recommended to set the probability of non-observed 
instances as null. It is possible that some of the 
instances are not present in the temporal expression 
profile because of its small size sample and/or by the 
dynamics of the system, i.e., the transition functions. 
Therefore, in order to reach a good trade-off we follow 
the penalization of non-observed instances [26,51]. The 
penalized criterion function by adopting the generalized 
Tsallis entropy is defined as follows: 



2=1 
in 



y r g + a l-Ev,P("ilg)* 
~ am + d a — 1 



(6) 



a(m — n) 



Sq{Vi) + 



am + a 

y r g + a l-j: Vi P(Vi\g)" 
^ am + d a — 1 

where a > 0 is the penalty weight, m is the number of 
possible instances of the gene group g (predictors), n is 
the number of observed instances, d is the total number 
of samples and r g is the number of each observed 
instance of g. 

If a is set to zero, we do not have any penalization and 
P(g) is estimated by its relative frequency on D T , i.e., by 

calculating the terms r g /d (j^ g r g = d^. When n = m, the 
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penalization term, first term in Equation 6, is canceled and 
P(g) is now estimated by a modulated relative frequency of 
the predictors, by adding a to all instances of g, i.e., 



and finally when n < m, the parameter a is considered 
m - n times for non-observed instances (sum), and n 
times for observed instances. Thus, in Equation 6 a 
positive mass of probability is assigned to the non- 
observed instances of the gene group g in the expression 
data, which is parameterized by a. 

Furthermore, the penalization of the non-observed 
instances is weighted by the entropy of the target gene, 
i.e., S q (Vi). This is important because of the possibility of 
having a good description about a gene when its uncer- 
tainty is small, i.e., the observed instances of the genes 
are enough to describe the dynamics of a target gene 
with small entropy. In this paper we set a = 1. 

The inference algorithm consists in calculating the 
mean conditional entropy by using Equation 6 and look- 
ing for a group of genes that minimizes it. This search is 
performed by the SFFS algorithm. 
4.3.2 Artificial gene networks 

The adopted AGN model was built based on the random 
Boolean network (RBN) approach [43,44,52]. This model 
yields insights into the overall behavior of large gene net- 
works, allows the analysis of large data sets in a global 
way and represents some fundamental characteristics of 
real GRNs [53-57]. In a RBN model, the state of each 
gene is a binary variable set by Boolean functions of 
other genes. The possibility to model GRNs as Boolean 
networks stems from the switch-like behavior that the 
cell exhibits during regulation of functional states 
[52,58]. In this context, the gene state is mapped from a 
continuous expression to a two-level expression (on/off). 

More specifically, an artificial gene network (AGN) is 
defined by a set V - {v v v 2 , v N } of N genes (nodes), a 
N x N adjacency matrix C (with Qy e {0, 1}) and a set F = 
{fi>f2> --'/n} of N transitions functions. In the Boolean 
approach, each^ is a logical circuit of the non-null ele- 
ments of the i th row of C that sets the state of the gene v t . 
Then, the network state at time t + 1 is a N-dimensional 
vector s(t + 1) = [vi(t + 1), v 2 (t + 1), + 1)] resulting 
from the application of these functions to the previous 
state s(t). Besides, the connectivity of v t is given by 
fej = Y^\=\ [Cij + QO an d the topology class of the network 
is defined by the probability distribution of these 
connectivities. 

The networks used in this paper were obtained by the 
network generator proposed in [21,22]: 

1. define a topology class, i.e., the distribution P(k) of 
the number k of connections per gene; 



2. define the k t connectivity for each gene v t setting 
the predictors (C ; /s) and targets (C# 's) by using the 
P(k) distribution; 

3. set the transfer function/ for each gene v t by ran- 
dom drawing a truth table according to its number 
of predictors (rip = Yjii Qt)> an output state for 
each one of the 2 n p input states. 

Once defined the AGN, the simulated temporal 
expression profile D T is obtained by defining an arbi- 
trary initial state of the network and successive applica- 
tions of the transfer functions. 

On the other hand, DREAM4 temporal expression 
profiles were generated by considering network struc- 
tures based on Escherichia coli and Saccha-romyces cere- 
visiae [49]. The dynamics was generated by continuous 
differential equations with the inclusion of perturbations 
on the data in order to simulate a physical or chemical 
intervention. Gaussian noise was also added in order to 
simulate expression measurement error. In summary, 
the DREAM4 time series database presents variations of 
network size with 10 and 100 genes, perturbation and 
noise on expression profiles generated by differential 
equations. A detailed description is provided in the 
DREAM project website [3]. 

In both cases (AGN and DREAM network), the tem- 
poral expression profile D T is submitted to the inference 
method and its results are evaluated according to the 
measures presented in the next section. 
4.3.3 Evaluation 

In order to quantify the similarity between the source 
gene network A and the inferred network B, we adopted 
the validation metric based on a confusion matrix [59] 
(see Table 3). 

The networks are represented in terms of their respec- 
tive adjacency matrices C, such that each connection 
from gene i to gene ; implies Qj = 1, with Qj = 0 other- 
wise. Then, in order to quantify the quality of the 
inferred network, the similarity measurements [60] 
widely used to compare inference methods were 
adopted, being calculated as follows: 



Similarity (A, B) = y/PPV x Specificity, 




TP 

PPV = 

[TP + FP) f 


(7) 


Specificity = . 

H 3 r (TN + FP) 




Table 3 Confusion matrix. 




Edge/Connection Inferred in B 


Not Inferred in B 


Present in A TP 


FN 


Absent in A FP 


TN 



Confusion matrix. TP = true positive, FN = false negative, FP = false positive, 
TN = true negative. 
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Since the measurements on Equation 7 are not inde- 
pendent of each other, it was adopted the geometrical 
average between the ratios of correct ones PPV (Positive 
Predictive Value, also known as accuracy or precision) 
and correct zeros (Specificity), observing the ground- 
truth matrix A and the inferred matrix B. In this way, 
both coincidences and differences are taken into 
account by these measures, thus implying the maximum 
similarity to be obtained for values near 1. 
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